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Abstract 

In  this  paper,  we  analyze  the  statistics  of  two  general  classes  of  statistics.  The  first  class  is  “M 
quadratic  and  linear  forms  of  correlated  Gaussian  random  variables” .  Examples  include  both  cyclic 
and  non-cyclic  autocorrelation  function  (ACF)  estimates  of  a  correlated  Gaussian  process  or  the 
magnitude-squared  of  the  output  samples  of  a  filtered  Gaussian  process.  The  second  class  consists 
of  a  subset  of  order  statistics  together  with  a  remainder  term.  An  example  is  the  largest  M  —  1  bins 
of  a  discrete  Fourier  transform  (DFT)  or  discrete  wavelet  transform  (DWT),  together  with  the  sum 
of  the  remaining  energies,  forming  an  M-dimensional  statistic.  Both  classes  of  statistics  are  useful 
in  classification  and  detection  of  signals.  In  this  paper,  we  solve  for  the  joint  probability  density 
functions  (PDFs)  of  both  classes.  Using  the  PDF  projection  method,  these  results  can  be  used  to 
transform  the  feature  PDFs  into  the  corresponding  high-dimensional  PDFs  of  the  raw  input  data. 


1  Introduction  and  Motivation 


The  so-called  M- ary  classification  problem  is  that  of  assigning  a  multidimensional  sample  of  data 
x  e  7Zn  to  one  of  M  classes.  The  statistical  hypothesis  that  class  j  is  true  is  denoted  by  Ilj . 
1  <  j  <  M.  The  statistical  characterization  of  x  under  each  of  the  M  hypotheses  is  described 
completely  by  the  joint  PDFs,  written  p(x.\Hj),  1  <  j  <  M.  Classical  theory  applied  to  the  problem 
results  in  the  so-called  Bayes  classifier,  which  simplifies  to  the  Neyman-Pearson  rule  for  equi-probable 
prior  probabilities,  namely, 


j*  =  argmax  p(x\Hj). 


(1) 


Because  this  classifier  attains  the  minimum  probability  of  error  of  all  possible  classifiers,  it  is  the  basis 
of  most  classifier  designs.  Unfortunately,  it  does  not  provide  simple  solutions  to  the  dimensionality 
problem  that  arises  when  the  joint  PDFs  are  unknown  and  must  be  estimated.  The  most  common 
solution  is  to  reduce  the  dimension  of  the  data,  by  extraction  of  a  small  number  of  information-bearing 
features  z  =  T(x),  and  then  re-casting  the  classification  problem  in  terms  of  z: 


j*  =  argmax  p(z\Hj). 

‘This  work  was  supported  by  the  Office  of  Naval  Research. 
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To  be  optimal,  the  feature-based  classifier  (2)  requires  that 


Pi*\Hj)  =  Piz\Hj)  (31 

p{*\Hk)  p{z\Hk) 

for  any  two  classes  j.  k.  Thus,  the  feature  space  z  must  optimally  separate  any  pair  of  classes.  This 
requirement  is  only  achieved  in  the  simplest  of  problems.  In  trying  to  achieve  (3),  we  encounter  a 
fundamental  tradeoff  -  whether  to  discard  features  in  an  attempt  to  reduce  the  dimension  to  something 
manageable  -  or  to  include  them  and  suffer  the  problems  associated  with  estimating  a  joint  PDF  with 
high  dimensionality.  Unfortunately,  there  may  be  no  acceptable  solution  where  there  is  both  adequate 
information  content  in  z  and  low  enough  dimension  for  robust  PDF  estimation.  Virtually  all  methods 
which  attempt  to  find  decision  boundaries  on  a  high-dimensional  space  are  subject  to  this  tradeoff 
or  “curse”  of  dimensionality.  For  this  reason,  many  researchers  have  explored  the  possibility  of  using 
class-specific  features. 

The  basic  idea  in  using  class-specific  features  is  to  extract  M  class-specific  feature  sets,  z?  = 
Tj(x),  1  <  j  <  M,  where  the  dimension  of  each  feature  set  is  small,  and  then  to  arrive  at  a  decision 
rule  based  only  upon  functions  of  the  lower-dimensional  features.  Unfortunately,  the  classifier  modeled 
on  the  Neyman-Pearson  rule, 

j*  =  argmax  p(zj\Hj),  (4) 

3 

is  invalid  because  comparisons  of  densities  on  different  feature  spaces  are  meaningless.  A  number  of 
approaches  have  emerged  in  recent  years  to  arrive  at  meaningful  decision  rules  [1]  [2]  [3]  [4]  [5]  [6]. 
All  these  methods  are  based  on  strong  assumptions,  so  are  not  general  solutions.  The  class-specific 
method  [7],  however,  is  an  optimal  classifier  based  on  a  sufficiency  assumption  much  milder  than  the 
standard  feature-based  classifier  and  is  otherwise  completely  general.  Provided  a  suitable  reference 
hypothesis  Hqj  can  be  found  for  each  class  such  that  p(x|i?oj)  and  p(zj\Hqj)  are  both  known,  the 
projected  PDF  of  x  may  be  constructed: 


P(x \Hj) 


A 


P{*\Ho,j) 

P(zj\H0,j) 


p(zj\Hj)- 


(5) 


According  to  the  PDF  projection  theorem  [7],  [8],  the  function  p(x\Hj)  is  guaranteed  to  be  a  PDF, 
and  therefore  is  a  PDF  approximation  to  p(x.\Hj).  Accordingly,  the  class-specific  classifier 


j*  =  arg  max 
3 


P{*\H0,j) 

P(zj\Ho,j) 


Pizj\Hj) 


(6) 


can  be  constructed.  The  accuracy  of  the  approximation  depends  only  upon  the  statistical  sufficiency  of 
the  class-specific  feature  set  z j  in  separating  Hj  from  Hqj.  More  precicely,  the  class-specific  classifier 
requires  for  optimality  that 

Pi*\Hj)  =  P(zj\Rj)  (7] 

P{x\Ho,j)  Pizj\R0,j) 

for  each  class  j.  Thus,  the  feature  space  z j  must  optimally  separate  a  given  class  Hj  from  the  hand¬ 
picked  reference  hypothesis  Hqj.  This  sufficiency  requirement  is  far  more  manageable  than  (3)  and 
is  helped  by  the  fact  that  while  {Hj}  are  given  and  cannot  be  chosen,  the  class-dependent  reference 
hypotheses  {Hqj}  can  be  individually  chosen. 

The  success  of  the  method  depends  on  being  able  to  calculate  the  “correction  terms” 


Q(x,  Tj,  Hqj) 


A  ptx\Ho,j) 
Pizj\Ho,j)' 
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which  we  call  “Q”  functions.  The  numerator  is  often  quite  easy  to  write  down,  but  the  denominator 
is  difficult  to  derive.  That  is  the  subject  of  this  paper  -  the  calculation  of  these  denominator  terms 
under  a  special  reference  hypothesis  -  the  white  Gaussian  noise  (WGN)  hypothessis.  A  wide  variety 
of  solutions  are  already  available  where  the  standard  normal  distribution,  or  WGN  hypothesis,  is 
used  for  the  reference  hypothesis  [9].  These  include  cyclic  cepstrum  and  autocorrelation  estimates 
from  independent  Gaussian  noise  samples.  Through  one-to-one  transformation,  these  results  can  be 
applied  as  well  to  linear  prediction  (LPC)  and  reflection  coefficients.  In  this  paper,  we  add  solutions 
for  two  new  classes  of  statistics  to  those  already  available. 

2  M  Quadratic  and  Linear  Forms  of  Correlated  Random  Variables 

The  second-order  statistics  of  correlated  Gaussian  random  variables  (RVs)  constitute  an  important  set 
of  statistics.  Examples  include  the  ACF  estimates  of  a  correlated  Gaussian  process  or  the  magnitude- 
squared  output  samples  of  a  linear  filter.  Applications  exist  in  SONAR  and  RADAR  detection  and 
estimation  problems.  Examples  include  both  cyclic  and  non-cyclic  ACF  estimates  of  dependent  Gaus¬ 
sian  noise  samples. 

2.1  Form  of  the  statistics 

The  general  form  of  the  statistics  of  interest  is 

z  =  T(x)  =  [Zl  z2...  zM]\ 

where  {zm}  are  M  quadratic  forms, 

zm  =  x'Pmx  +  p^x  +  qm,  1  <m<  M,  (8) 

and  x  is  the  AT-by-1  real  input  data  vector 

x  =  [xi  x2  ■  ■  ■  xN]\ 

{Pm}  are  M  real  symmetric1  AT-by-AT  matrices,  {Pm}  are  AT-by-1  vectors,  and  {//,„}  are  scalars. 

The  challenge  is  to  determine  the  joint  PDF  of  z  under  a  specified  Gaussian  hypothesis,  that  is, 

P{ Z)  Rti  Pxi  {Era}i  {Pm}j 


where  AT  x  1  mean  vector 

E(x)  =  nx 

and  N  x  N  covariance  matrix 

E((x  -  /x)(x  -  n)')  =  Rx. 

Example  1  Autocorrelation  Function.  An  example  of  a  class  of  statistics  which  are  of  the  form 
(8)  are  ACF  estimates 

1  N 

n  =  T7  X  Xi  Xi-ti  0  <  t  <  N  —  1.  (9) 

iv  i=t+ 1 

1  There  is  no  loss  of  generality  in  assuming  {Pm}  are  symmetric,  because  any  anti-symmetric  component  of  {Pm}  will 
cancel  out  in  the  quadratic  form. 
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0  0  1  0  •  ■  ■ 

0  0  0  1  ■  ■  ■ 

1  0  0  0  -  -  - 

0  1  0  0  ••• 

and  so  forth.  The  pattern  is  such  that  P&  is  nonzero  only  on  the  super-  and  sub- diagonals  spaced  k 
away  from  the  main  diagonal. 

If  the  sample  mean  is  subtracted  from  x  prior  to  calculation  of  the  ACF  estimates,  the  quadratic 
forms  (8)  still  hold,  but  the  elements  of  { }  are  changed.  For  example,  the  j,  k-th  element  of  Po  is 
now  5jk  —  1  /N  instead  of  8jk,  where  8jk  is  the  Kronecker  delta;  the  remaining  matrices  {P&}  are  more 
complicated,  but  each  element  in  the  matrices  can  be  evaluated  by  means  of  a  single  sum,. 

Equation  (9)  involves  an  aperiodic  correlation  of  data  x.  The  extension  to  cyclic  correlation  es¬ 
timates  can  also  be  formulated  in  terms  of  quadratic  forms  by  wrapping  each  of  the  diagonals.  For 
example,  for  N  =  6,  P2  becomes 

OOIOIO 
0  0  0  1  0  1 

1  0  0  0  1  0 

0  1  0  0  0  1 

10  10  0  0 

0  10  10  0 

Cross- correlations 

zm  =  u'Pmv,  1  <  m  <  M,  (11) 
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can  also  be  written  as  (8)  if  we  define 


Example  2  Linear  Filtering.  Let  y  be  an  M-by-1  output  vector  from  a  linear  filtering  operation 

y  =  Ax, 

where  A  is  an  M-by-N  filter  matrix  and  x  is  a  real  N-by-1  input  vector.  Let  z  be  the  M-by-1  vector 
of  squared  values  of  the  elements  of  y.  For  example,  let 

02  a\  ao  0  0  0 

0  02  ai  ao  0  0 

0  0  02  ai  ao  0 

0  0  0  02  ai  ao 

Then,  we  may  write  the  elements  of  z  as 

zm  =  x'  A'  Um  Ax,  1  <  m  <  M, 

where  Um  is  an  M-by-M  matrix  of  all  zeros  except  a  single  one  on  the  main  diagonal  in  location 
m,  m. 

2.2  Compression  of  Parameters 

Note  that  there  is  no  loss  of  generality  in  assuming  that  R,  —  In  and  pLx  =  0,  where  I  at  is  the  N-by-N 
identity  matrix  and  0  is  the  iV-by-1  vector  of  zeros.  This  is  because  we  can  write 

P{z- R-xj  Pxi  {fm}i  {Pm}i  {Qm})  = 

P{z i  O5  {Pm},  {Pm},  {%}), 

where 

Pm  =  CPm  C', 

Pm  =  C  pm  +  2CP miax, 

Qm  =  Qm+  P  'mVx  +  VxFmP'xi 

and  C  is  the  Cholesky  decomposition  of  R,  , 

Rx  =  C'  C. 


2.3  Saddlepoint  Approximation 

Since  no  closed-form  expression  for  the  joint  PDF  of  z  in  (8)  is  known,  we  apply  the  Saddlepoint 
approximation  [10], [9].  To  obtain  the  SPA,  we  need  the  joint  cumulant  generating  function  (CGF)  of 
z,  namely, 

cz{  A)  =  log  (A), 

where  gz(X)  is  the  joint  moment-generating  function  (MGF)  of  z.  Also,  we  need  the  first  and  second- 
order  partial  derivatives  of  cz{ A).  Once  these  are  known,  the  formulas  in  reference  [9]  may  be  used  to 
obtain  the  SPA. 
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It  is  shown  in  [11]  that 


where 

with 

and 


c*(A)  =  ~  log  |Q(A)|  +  ^t'(A)  Q  1(A)t(A)  +«(A), 


Q(A)  =  Im  —  2D(A), 

M  M 

D(A)  ±  Y,  *(A)  =  E  X™P m, 

m=l  m=  1 


M 


*(A)  =  E  XmQm- 


m= 1 


The  first-order  partial  derivatives  are 


_T_  c 

d\ m  Cz 


(A)  =  tr{Q  1Pm}  +  p^Q  H  +  t'BmQ  xt, 


for  1  <  m  <  M,  and  the  second-order  partial  derivatives  are 

92  cz( A)  =  2  tr  {B/Bto}  +  pJQ_1pm 

+2p[BmQ“1t  +  2p[nB;Q_1t 


d\,d\„ 


(13) 


+4trB;BmQ  1t 

where 

Bm(A)  =  Q-1(A)Pm, 

and  we  drop  the  (A)  dependence  from  t(A),  Q_1(A),  and  Bm(A),  for  simplicity.  The  third  and  fourth 
derivatives,  necessary  for  the  first-order  correction  term  of  the  SPA  have  also  been  worked  out  [11]. 
The  equations  simplify  considerably  if  we  assume  that  {pm}  and  { qrn }  are  all  zero.  We  then  have 

cz(A)  =  log  (A)  =  —  ^  log  i  Q  ( A)  |  .  (14) 

The  first  order  partial  derivatives  reduce  to 

aW  C^(A)  =  tr{Bm} 

and  the  second  order  partial  derivatives  become 

<9 A;d A™  C^(A)  =  2tr{B^Bm},  1  <Z,m<M. 

Example  3  Autocorrelation  Function.  We  revisit  example  1  and  test  the  SPA  for  cyclic  ACF 
estimates  against  the  SPA  solution  in  [9].  We  used  the  cyclic  ACF  (e.g.  equation  10)  with  N  =  32 
and  M  =  3.  The  results  are  shown  in  Figure  1.  The  two  methods  agree  very  closely.  The  differences 
are  so  small  that  they  can  be  explained  by  differences  in  the  stopping  point  of  the  iteration  to  find  the 
saddlepoint. 
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Log— PDF  (From  Prior  Work) 


Figure  1:  Comparison  of  Saddlepoint  Approximation  from  previous  work  (x-axis)  with  method  of  this 
section  for  cyclic  ACF  estimates.  The  largest  difference  was  .021 


Example  4  Linear  Filtering.  We  revisit  example  2  and  test  the  SPA  solution  against  a  Gaussian 
mixture  approximation.  We  used  a  filter  length  of  3  as  shown  in  equation  (12)  and  a  feature  size  of 
M  =  3.  The  filter  coefficients  were  a  =  [1  1  1].  A  Gaussian  mixture  approximation  of  p( z)  under 
the  WGN  assumption  was  obtained  using  5000  samples  and  20  mixture  components.  Next,  100  new 
samples  were  generated  and  the  log  PDF  from  the  SPA  was  compared  with  the  mixture  approximation. 

The  results  of  the  experiment  are  shown  in  Figure  2.  On  the  graph,  the  dots  represent  the  method 
of  this  section  compared  with  the  Gaussian  mixture.  There  is  a  positive  bias  of  about  .5  associated  with 
the  SPA.  This  bias  can  be  attributed  to  that  fact  that  the  statistics  are  highly  non- Gaussian.  There 
are  two  things  to  note  about  this  bias.  First,  the  accuracy  of  the  SPA  depends  only  upon  the  shape  of 
the  joint  MGF  in  the  vicinity  of  the  saddlepoint,  not  upon  its  magnitude.  Thus,  the  errors  tend  not 
to  increase  as  we  go  into  the  tails  of  the  PDF.  This  is  in  constrast  to  the  central-limit  theorem  (CLT) 
approximation  which  tends  to  have  extremely  large  errors  in  the  tails  of  the  PDF.  Second,  the  error 
of  .5  that  we  noted  means  an  error  o/exp(.5)  =  1.65,  or  a  65  percent  error  in  the  PDF  value.  While 
this  may  seems  to  be  large,  consider  that  this  will  be  the  error  of  the  projected  raw  data  PDF  (using 
equation  5),  which  is  quite  small  for  a  high-dimensional  PDF.  When  working  with  high-dimensional 
PDFs,  likelihood  values  vary  over  extremely  wide  ranges.  PDF  accuracy  values  are  better  thought  of 
in  terms  of  the  error  in  the  log-PDF  per  dimension. 

To  test  the  hypothesis  that  this  bias  is  associated  with  the  shape  of  the  MGF  in  the  vicinity  of  the 
saddlepoint,  we  implemented  the  first-order  correction  term.  The  first-order  correction  term  involves 
the  third  and  fourth- order  partial  derivatives  of  the  joint  CGF  and  amounts  to  an  additional  term  in 
the  Taylor  series  expansion  which  gives  rise  to  the  SPA.  This  term  has  been  worked  out  by  Nuttall  for 
the  the  statistics  of  general  form  (8)  [11],  When  the  first-order  correction  term  was  added,  the  bias 
disappeared  (circles  on  the  graph).  Note  that  the  first-order  correction  term  is  often  computationally 
impractical  for  large  M .  Since  small  bias  in  the  SPA  is  not  significant  in  the  context  of  equation  (5), 
adding  the  correction  term  may  not  be  worth  the  extra  computational  effort. 
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Figure  2:  Comparison  of  Saddlepoint  Approximation  from  reference  with  a  Gaussian  mixture  approx¬ 
imation.  Dots  are  without  first-order  correction  term,  circles  with  correction  term. 


3  Order  Statistics  with  Residual  Energy. 

3.1  Features 

Let  x  =  [xyX‘2  ■  ■  ■  x i\r ]  be  a  set  of  N  independent  random  variables  (RVs)  distributed  under  hypothesis 
Hq  according  to  the  common  PDF  po(x).  The  joint  probability  density  function  (PDF)  of  x  is 

N 

p(xi-ff0)  =  n  po(®0- 

i=  1 

Now  let  Xi  be  ordered  in  decreasing  order  into  the  set  y  =  [yij/2  ■  ■  -Un]  where  yt  >  yt+i-  We  now 
choose  a  set  of  M  —  1  indexes  ti,t2  ■  ■  ■  tM-h  with  l  <  t\  <  t2  <■■  ■  Im- i  <  N  to  form  a  selected 
collection  of  order  statistics  yt±  ■  yt2  ■  ■  ■ ytM-i  ■  ^o  this  set,  we  add  the  residual  “energy”, 

r=  Y,  h(y^-'  (15) 

jEM 

where  the  set  Ai  are  the  integers  from  1  to  N  not  including  the  values  C,  C  •  •  •  and  h{x)  is  a 
function  which  is  needed  to  insure  that  r  has  units  of  energy.  We  than  form  the  complete  feature 
vector  of  length  M  ( M  >  2): 

z  =  ivti  yt-2  ■  ■  ■  vtM-i r]'- 

By  appending  the  residual  energy  to  the  feature  vector,  we  insure  that  z  is  a  sufficient  statistic  for 
unknown  scale  factors  applied  to  x.  We  consider  two  important  cases: 

1.  Let  x  be  a  set  of  magnitude-squared  DFT  bin  outputs,  which  are  exponentially  distributed. 
Since  x  is  already  in  units  of  energy,  h(x)  =  x. 

2.  Let  x  be  a  set  of  absolute  values  of  zero-mean  Gaussian  RVs.  Then,  h(x)  =  x2. 
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3.2  Integral  Solution 

3.2.1  Probability  Density  Function 

Interestingly,  the  joint  PDF  of  z  can  be  reduced  to  a  single  one-dimensional  integral  expression  [12]. 
Define 


U 


c(u,  A)  = 

J 

/  po(a:)  exp  (Ah  (a:))  da: 

’  —  OO 

(16) 

e(u,  A)  = 

POO 

/  Pol®)  exp(Ah(a:))  dx 

(17) 

U 


We  find  the  joint  PDF  of  z  to  be 

Pz(Z)  =  m  {rim^l  Po{zm)} 


where 


Re  jexp  [-(A  +  iy)zM]  7(A  +  *y)|  dy, 

(M- 2  'j 

D  =  (h  -  1)!  n  (Wi  -  trn  -  1)!  (N  -  tM- 1)!* 
I  m=  1  J 

/(A)  = 


{n"1  [e(^m+i5  A)  —  e(zm,  A)]tm+1  1}. 


3.2.2  Example  1 

Let  xn  follow  the  standard  exponential  distribution  po(x)  =  exp(— x)  for  x  >  0.  Let  h(x)  =  x.  Then, 
for  Re(A)  <  1, 

e(u,  A)  =  -  e(x-1')u  for  u  >  0;  -  for  u  <  0. 

1  —  A  1  —  A 

c(u.  A)  =  -  fl  —  for  u  >  0;  0  for  u  <  0. 

1  —  A  '  ’ 


3.2.3  Example  2 

Let  xn  =  \gn\  where  gn  follows  the  standard  normal  distribution  1V(0, 1).  Let  h(x)  =  x2.  Then,  for 
Re(A)  < 

r  oo 

e(u,  A)  =  /  exp(— a;2/2)  exp(Aa;2)  dx 


=  ^(-Wl  -  2A)  for  u  >  0; 

vra  for  u  <  °- 

Also, 

c(u.  A)  =  1-2^L-mV'i-2A)  for  u  >  0;  0  for  w  <  0. 
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3.3  SPA  Solution  for  Exponential  RVs 

We  now  consider  the  first  case  (exponential  RVs)  and  apply  the  SPA.  This  will  provide  a  means  of 
validating  the  integral  solution  of  the  previous  section.  As  we  explained  in  Section  2.3,  to  obtain  the 
SPA,  we  need  the  joint  CGF  and  its  first  and  second-order  partial  derivatives.  Once  these  are  known, 
the  formulas  in  reference  [9]  may  be  used  to  obtain  the  SPA. 


3.3.1  Joint  MGF  of  y. 

We  start  with  the  joint  MGF  of  y,  which  is  (see  [13],  p.  68,  eq.  B-18) 

<7o(«i,  «2  . . .  Otv)  = 

_ l _ 

(l-ai)(l-5n+5^)(i-al+a32+“3)...(i_al+a2+-+^Zy 

for  Re(ai)  <  1,  Re(ai  +  02)  <  2,  ...  ,  Re(ai  +  «2  +  -  -  -  +  otv)  <  A".  We  can  rewrite  (18)  as 

1 


(18) 


3o(«l,  «2  •  •  •  <F/v)  = 


Un=l  0n(«  15  «2  •  •  •  «/v)  ’ 


where 


Alternatively, 


where 


^„(ai,«2  ■  •  •  oat)  =  1 - y ]ap,  1  <n<N 

n. 


p= 1 


N 


(fin (  U  i ,  a 2  •  •  •  ayy)  —  1  ^  ]  Qnp  Otpt  1  V  71  V  iV, 

P=  1 


(  1 

n 


Qnp 


=  < 


-  for  1  <  p  <  n 


>  ,  n  —  1, 2  . . .  N. 


0  for  n  <  p  <  N 
Define  the  iV-by-iV  matrix  Q  =  [qnp]  and  a  =  [a\  a.2  •  •  •  ayv]/-  Then, 


0(a)  = 


0i(«) 

02(a) 

0jv(a) 


=  I  -  Qa, 


where  1  is  an  iV-bv-1  column  vector  of  ones.  Thus,  fjo  { a )  is  the  reciprocal  of  the  product  of  the 
elements  of  0(a),  denoted  by 

'Ma]  =  prod(l-Qa)’  (19) 

where  prod(  )  is  the  product  of  the  elements  of  the  argument. 
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3.3.2  Joint  MGF  of  z. 

The  joint  MGF  of  z  is,  for  A  =  [A1A2  . . .  A m]\ 

9z{  A)  =  E{exp(A'z)} 

=  E{exp(Aiytl  +  A2yt2  +  (20) 


•  •  •  +  A M-iVtu-i  +  Amt)}  . 


This  can  be  written 

9zW  =  do  (A  A) 

where  A  is  the  N-by-M  matrix  that  has  l’s  everywhere  in  the  M- th  column  except  for  0’s  in  rows 
ti,t2  ■  ■  ■  tM- 1,  and  A  has  l’s  in  row  G,  column  1;  row  t2,  column  2;  etc.  Therefore,  from  (19), 

^  ^  prod(.l  —  QAA) 

Note  that  Q  can  be  a  large  N-by-N  matrix  if  N  is  large.  However,  if  we  define 

P  =  QA, 


P  is  a  reasonable  N-by-M  size  matrix  and  can  be  easily  formed  directly.  The  final  simplified  form  for 
the  joint  MGF  is 


9zW 


1 

prod(l  —  PA)  ’ 


3.3.3  Partial  Derivatives  of  the  CGF. 

To  obtain  the  SPA  to  the  PDF  of  z,  we  need  the  joint  cumulant  generating  function  (CGF)  cz(A)  of 
z  and  its  partial  derivatives.  The  joint  CGF  is  defined  by 

Cz{ A)  =  log(gz(A))  =  -sum  (log(l  -  PA)) 

where  log  is  the  vector  log  function  which  operates  on  each  element  of  its  argument  and  sum(  )  is 
the  vector  sum,  which  adds  up  all  the  elements  of  the  argument.  If  we  define  1  (A)  as  the  element- 
by-element  reciprocal  of  1  —  PA,  and  3? (A)  as  the  diagonal  N-by-N  matrix  with  elements  equal  to 
the  elements  of  X  —  PA,  it  is  straight-forward  to  show  that  the  gradient  vector  of  cz(A)  is  the  M- by-1 
vector 

<5(A)  =  ^  cz(A)  =  P'  <£_1(A), 
and  the  M-by-M  Hessian  matrix  of  cz(A)  is 

C*(A)  =  S^A7  C^(A)  =  P'  #"2(A)  P’ 


3.4  Comparison 

A  good  check  on  the  analysis  is  to  compare  the  two  methods  just  presented  in  Sections  3.3  and  3.2. 
We  tried  the  case  of  IV  =  100,  M  =  3,  ti  =  3,  f2  =  7.  Samples  of  input  data  vector  x  were  generated 
using  N  independent  uniformly  distributed  RVs  (in  the  range  0  to  1),  then  scaled  by  a  random  scale 
factor  in  the  range  0  to  10.  Note  that  the  PDF  used  for  data  generation  is  not  important  because 
we  are  comparing  two  PDF  approximations  for  the  same  input  feature  vector.  The  PDF  of  z  was 
computed  using  the  methods  of  Sections  3.3  and  3.2.  The  results  are  shown  in  Figure  3.  The  two 
methods  agreed  very  closely  -  within  a  maximum  error  of  .059  in  log  space. 
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Figure  3:  Comparison  of  Saddlepoint  Approximation  (Section  3.3)  with  integral  solution  (Section  3.2). 
Largest  difference  was  .059 


4  Conclusions 

In  this  paper,  we  have  derived  saddlepoint  approximations  to  the  multidimensional  PDFs  for  two 
general  classes  of  features.  The  availability  of  these  PDFs  permits  these  features  to  be  used  in  a 
class-specific  classifier.  The  PDFs  were  checked  using  numerical  simulations. 


References 

[1]  A.  Mashoshin,  “Synthesis  of  algorithms  for  the  classification  of  underwater  objects  from  their 
underwater  sound  field,”  Acoustical  Physics ,  vol.  42,  no.  3,  pp.  347-351,  1996. 

[2]  S.  Kumar,  J.  Ghosh,  and  M.  Crawford,  “A  versatile  framework  for  labeling  imagery  with  large 
number  of  classes,”  in  Proceedings  of  the  International  Joint  Conference  on  Neural  Networks , 
(Washington,  D.C.),  1999. 

[3]  S.  Kumar,  J.  Ghosh,  and  M.  Crawford,  “A  hierarchical  multiclassifier  system  for  hyperspectral 
data  analysis,”  in  Multiple  Classifier  Systems  (J.  Kittler  and  F.  Roli,  eds.),  pp.  270-279,  Springer, 
2000. 

[4]  H.  Watanabe,  T.  Yamaguchi,  and  S.  Katagiri,  “Discriminative  metric  design  for  robust  pattern 
recognition,”  IEEE  Trans.  Signal  Processing ,  vol.  45,  no.  11,  pp.  2655-2661,  1997. 

[5]  P.  Belhumeur,  J.  Hespanha,  and  D.  Kriegman,  “Eigenfaces  vs.  fisherfaces:  Recognition  using  class 
specific  linear  projection,”  PAMI ,  vol.  19,  pp.  711-720,  July  1997. 

[6]  D.  Sebald,  “Support  vector  machines  and  the  multiple  hypothesis  test  problem,”  IEEE  Trans. 
Signal  Processing ,  vol.  49,  pp.  2865-2872,  November  2001. 

[7]  P.  M.  Baggenstoss,  “A  theoretically  optimum  approach  to  classification  using  class-specific  fea¬ 
tures.,”  Proceedings  of  ICPR,  Barcelona ,  2000. 


12 


[8]  P.  M.  Baggenstoss,  “A  modified  Baum-Welch  algorithm  for  hidden  Markov  models  with  multiple 
observation  spaces.,”  IEEE  Trans.  Speech  and  Audio ,  pp.  411-416,  May  2001. 

[9]  S.  M.  Kay,  A.  H.  Nuttall,  and  P.  M.  Baggenstoss,  “Multidimensional  probability  density  func¬ 
tion  approximation  for  detection,  classification  and  model  order  selection,”  IEEE  Trans.  Signal 
Processing ,  Oct  2001. 

[10]  0.  E.  Barndorff-Nielsen  and  D.  R.  Cox,  Asymptotic  Techniques  for  Use  in  Statistics.  Chapman 
and  Hall,  1989. 

[11]  A.  H.  Nuttall,  “Saddlepoint  approximation  and  first-order  correction  term  to  the  joint  probability 
density  function  of  M  quadratic  and  linear  forms  in  K  Gaussian  random  variables  with  arbitrary 
means  and  covariances,”  NUWC  Technical  Report  11262 ,  December  2000. 

[12]  A.  H.  Nuttall,  “An  integral  solution  for  the  joint  PDF  of  order  statistics  and  the  residual  sum,” 
NUWC  Technical  Report  (to  be  published j,  October  2001. 

[13]  A.  H.  Nuttall,  “Detection  performance  of  generalized  likelihood  ratio  processors  for  random  signals 
of  unknown  location,  structure,  extent,  and  strength,”  NUWC  Technical  Report  10739 ,  August 
1994. 


13 


